
When AI learns from AI, bias travels too, study warns

Researchers want closer scrutiny of training data & how models are related to one another

BENGALURU: AI systems aren't just learning tasks from one another; they may also be passing along hidden biases and behavioural tendencies, even when those signals aren't visible in the data, a new study has found.

Published in Nature, the research was led by Alex Cloud and Minh Le of Anthropic, along with colleagues from Truthful AI, University of California Berkeley, the Oxford Martin AI Governance Initiative, the Alignment Research Center, and Warsaw University of Technology. The team was supervised by Owain Evans of Truthful AI and UC Berkeley, who proposed the study.

The research shows that newer AI models can pick up traits from older ones simply by training on their outputs. This happens even when the training material appears neutral and unrelated to those traits.

The process, known as distillation, is widely used to build smaller or more efficient AI models: a "student" system is trained on responses generated by a "teacher" model. What the study reveals is that this exchange carries more than just useful knowledge, and that the risk is not as universal as it might first appear.

In controlled experiments, the researchers created teacher models with specific tendencies, such as favouring a particular animal.
These models were then asked to produce datasets stripped of any obvious clues, such as plain number sequences. Yet when student models were trained on this data, they began displaying the same preferences, despite no direct reference to them in the numbers.

There is, however, an important catch. The transfer worked reliably only when the teacher and student were built on the same underlying design. When the team tested mismatched models, systems from different families, the effect largely disappeared. This suggests the phenomenon is tied to shared internal structures, not to some general contamination that spreads between any two AI systems.

The implications become sharper when the transferred traits are harmful. When a teacher model was tuned to behave in unsafe ways, the student adopted similar patterns. In some cases it generated responses encouraging violence or illegal acts, even though the training data had been carefully filtered to remove problematic content. Researchers recorded such responses in roughly one in ten outputs, compared to almost none in standard models.

What makes the finding difficult to manage is that transmission does not rely on obvious meaning. Models can absorb patterns embedded in data that appear meaningless to humans, whether numbers, programming code, or reasoning traces. The team also tested whether simply showing a model the same data, rather than training on it, would produce the same effect. It did not. The bias transfer appears to happen during the training process itself; it is not something a model can simply read off the page.

Jacob Hilton of the Alignment Research Center further showed, through a mathematical proof, that this tendency is not a quirk of their particular experiments.
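The data pipeline the study describes can be sketched in a few lines. This is a hypothetical toy illustration, not the authors' code: the function names, the filter terms, and the use of random numbers as a stand-in for model outputs are all invented for the example.

```python
import random

# Toy sketch of the setup described above: a "teacher" with a hidden
# preference emits plain number sequences, a content filter screens
# them, and the surviving data becomes the student's training set.

random.seed(0)

def teacher_sequence(length=10):
    # Stand-in for a teacher model asked to continue number sequences.
    # In the study, the teacher's hidden trait subtly shapes which
    # numbers it emits, in ways a human reader cannot see.
    return [random.randint(0, 999) for _ in range(length)]

def passes_filter(seq, banned=("violence", "illegal")):
    # A content filter can only reject what it can recognise; pure
    # numbers mention none of the banned terms, so everything passes.
    text = " ".join(map(str, seq))
    return not any(word in text for word in banned)

# Build the distillation dataset: teacher outputs that survive filtering.
dataset = [s for s in (teacher_sequence() for _ in range(100))
           if passes_filter(s)]
print(len(dataset))  # all 100 sequences pass the filter
```

Per the study, a student fine-tuned on data filtered this way can still inherit the teacher's trait, provided both models share the same underlying design, which is why filtering visible content alone was not enough.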
It appears to be a fundamental property of how neural networks learn when built from the same starting point, meaning it could surface in many real-world settings, not just in a laboratory.

The findings arrive at a time when AI development increasingly depends on machine-generated data. Companies often use outputs from existing systems to train newer versions, raising the possibility that hidden tendencies could quietly travel forward. Researchers note, though, that their experiments used simpler conditions than those found in frontier AI development, and questions remain about which traits can be transmitted, under what conditions, and whether the effect can be reversed.

Current safety checks focus largely on visible behaviour: what a model says, and whether it appears to act appropriately. The study suggests this may not be enough. The team argues for closer scrutiny of how training data is produced and how models are related to one another.
About the Author
Chethan Kumar

Chethan Kumar is a Senior Assistant Editor with the Times of India. Aside from specialising in Space & Science, he has reported extensively on varied topics, with special focus on defence, policy and data stories. He has covered multiple elections, too. As a young democracy grows out of adolescence, Chethan feels, there are reels of tales emerging which need to be captured. To do this, he alternates between the mundane goings-on of the Common Man and the wonder-filled worlds of scientists and scamsters, politicians and soldiers. In a career spanning nearly 18 years, he has reported from multiple datelines — Houston, Florida, Kochi, Hyderabad, Chennai, Sriharikota (AP), NH-1 (J&K Highway), New Delhi, Ahmedabad, Raichur, Bhatkal, Mysuru, Chamarajanagar, to name a few — but is based out of Bengaluru, India’s science capital that also hosts the ISRO HQ.
